GPU technology was developed for graphics processing in computer games, with the purpose of offloading the calculations involved in 2D and 3D graphics from the CPU. Early GPUs were fixed-function hardware accelerators specialized to perform the most common graphical operations. These operations tend to apply the same computation to many different data elements, so the GPU uses massive parallelism to carry them out. As an example, consider the task of rotating an object. Each point $\bb{P}_i$ is rotated into $\bb{S}_i$:

	\begin{equation}
		\bb{S}_i = \bb{R} \cdot \bb{P}_i
		\label{eq:rotation_transform}
	\end{equation}
	
	where $\bb{R}$ is a $2 \times 2$ rotation matrix that is the same for all points. Since each $\bb{S}_i$ depends only on $\bb{R}$ and $\bb{P}_i$, all of them can be evaluated independently and in parallel. Fixed-function GPUs have since evolved into programmable units with the same parallel architecture, but where the parallel operations can be programmed for each application. Although they are meant for graphics processing such as 3D shading, these programmable GPUs can also be used for general-purpose computations (GPGPU). Applications that fit the GPU architecture perform similar calculations on thousands of data elements, and GPU manufacturers such as NVIDIA and AMD have recognized this potential and have begun to support general-purpose computing in their products.
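	To make the data parallelism of Equation \ref{eq:rotation_transform} concrete, the following is a minimal sketch in plain Python, not GPU code. The function names \texttt{rotation\_matrix}, \texttt{rotate\_point} and \texttt{rotate\_all} are hypothetical. The point is that \texttt{rotate\_point} reads only $\bb{R}$ and a single $\bb{P}_i$, so each call could be assigned to its own GPU thread; here the calls simply run one after another on the CPU.

```python
import math

def rotation_matrix(theta):
    """Return the 2x2 matrix R for a counter-clockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def rotate_point(R, p):
    """Compute S_i = R . P_i for one 2D point; uses only R and p, nothing shared."""
    (r00, r01), (r10, r11) = R
    x, y = p
    return (r00 * x + r01 * y, r10 * x + r11 * y)

def rotate_all(R, points):
    # Each element is computed independently of all others. On a GPU, every
    # iteration could be handled by a separate thread; here they run serially.
    return [rotate_point(R, p) for p in points]

# Usage: rotate three points by 90 degrees.
R = rotation_matrix(math.pi / 2)
S = rotate_all(R, [(1.0, 0.0), (0.0, 1.0), (2.0, 3.0)])
```

	Because no iteration depends on another, the order in which the points are processed does not affect the result, which is exactly the property a GPU exploits.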
	
\subsubsection{Current state of GPGPU}

	NVIDIA's current generation of GPUs is based on the GT200 architecture, which is used in its latest products such as the GeForce GTX 295 for computer games, the Quadro FX 5800 for graphics workstations and the Tesla C1060/S1070 for high-performance computing. AMD's current GPU generation is the HD 5000 series, which is used in recent products such as the HD 5870. Specifications for the GT200 (as used in the Tesla C1060) and the HD 5870 are given in Table \ref{table:gt200hd5000}.
		
		\begin{table}[h]
		\centering
		\caption{NVIDIA GT200 and AMD HD 5870 specifications}
		\begin{tabular}{| l l l |}
			\hline
			\textbf{GPU} & GT200 (Tesla C1060) & HD 5870 \\
			\hline
			\textbf{Number of cores} & 240 & 1600 \\
			\textbf{Core frequency} & 1296 MHz & 850 MHz \\
			\textbf{Memory size} & 4 GB & 1 GB \\
			\textbf{Theoretical memory bandwidth} & 102 GB/s & 153 GB/s \\
			\textbf{Theoretical single-precision performance} & 933 GFLOPS & 2720 GFLOPS \\
			\hline
		\end{tabular}
		\label{table:gt200hd5000}
		\end{table}
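	The theoretical performance figures in Table \ref{table:gt200hd5000} can be reproduced by multiplying the number of cores by the core frequency and by the number of floating-point operations each core can issue per clock cycle. Assuming three operations per cycle for the GT200 (a multiply-add, counted as two, plus an additional multiply) and two for the HD 5870 (one multiply-add), the peak rates are:

	\begin{equation}
		240 \times 1296\ \mathrm{MHz} \times 3 \approx 933\ \mathrm{GFLOPS}
	\end{equation}

	\begin{equation}
		1600 \times 850\ \mathrm{MHz} \times 2 = 2720\ \mathrm{GFLOPS}
	\end{equation}

	These are peak single-precision rates; real applications generally reach only a fraction of them, since they are also limited by memory bandwidth and by how well the computation maps onto the parallel cores.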